The vocabulary of a continuous speech recognition (CSR) system is a significant factor in determining its performance. In this paper, we present three principled approaches to select the target vocabulary for a particular domain by trading off between the target out-of-vocabulary (OOV) rate and vocabulary size. We evaluate these approaches against an ad-hoc baseline strategy. Results are presented in the form of OOV rate graphs plotted against increasing vocabulary size for each technique.
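To make the trade-off between OOV rate and vocabulary size concrete, the following is a minimal sketch of how such a curve could be computed. It is not one of the three approaches presented in the paper, nor necessarily the baseline: ranking words by training-corpus frequency is an assumption made purely for illustration, and the names `oov_rate` and `oov_curve` and the toy corpora are hypothetical.

```python
from collections import Counter


def oov_rate(vocab, test_tokens):
    """Return the fraction of test tokens that are not in the vocabulary."""
    if not test_tokens:
        return 0.0
    misses = sum(1 for tok in test_tokens if tok not in vocab)
    return misses / len(test_tokens)


def oov_curve(train_tokens, test_tokens, sizes):
    """Return (vocabulary size, OOV rate) pairs for each requested size.

    Assumption for illustration: the vocabulary of size n is the n most
    frequent training words. Ties are broken alphabetically so the
    result is deterministic.
    """
    counts = Counter(train_tokens)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return [(n, oov_rate(set(ranked[:n]), test_tokens)) for n in sizes]


# Toy corpora, invented for this sketch only.
train = "the cat sat on the mat the dog sat".split()
test = "the dog sat on the rug".split()

curve = oov_curve(train, test, [1, 2, 4, 6])
for size, rate in curve:
    print(f"vocabulary size {size}: OOV rate {rate:.3f}")
```

Because each larger vocabulary in this sketch contains every smaller one, the OOV rate can only stay level or fall as the size grows. Plotting the resulting pairs gives a curve of the kind the abstract describes, although the paper's actual selection criteria may differ.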